The reference architecture in this chapter is from
  https://www.tensorflow.org/text/tutorials/transformer
